Overview

Dataset statistics

Number of variables: 16
Number of observations: 11684068
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.0 GiB
Average record size in memory: 459.2 B

Variable types

Categorical: 6
Numeric: 9
DateTime: 1

Alerts

InventoryId has a high cardinality: 203508 distinct values [High cardinality]
Description has a high cardinality: 7215 distinct values [High cardinality]
VendorName has a high cardinality: 118 distinct values [High cardinality]
City has a high cardinality: 67 distinct values [High cardinality]
Brand is highly correlated with Size and 1 other field [High correlation]
SalesQuantity is highly correlated with SalesDollars and 1 other field [High correlation]
SalesDollars is highly correlated with SalesQuantity and 1 other field [High correlation]
SalesPrice is highly correlated with SalesDollars [High correlation]
Volume is highly correlated with Size [High correlation]
Classification is highly correlated with Brand and 1 other field [High correlation]
ExciseTax is highly correlated with SalesQuantity and 1 other field [High correlation]
Size is highly correlated with Brand and 2 other fields [High correlation]
Store is highly correlated with City [High correlation]
City is highly correlated with Store [High correlation]
SalesQuantity is highly skewed (γ1 = 26.01507302) [Skewed]
SalesDollars is highly skewed (γ1 = 39.84989273) [Skewed]
SalesPrice is highly skewed (γ1 = 42.4717086) [Skewed]
ExciseTax is highly skewed (γ1 = 29.34954166) [Skewed]
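The cardinality and skewness alerts are consistent with simple threshold rules. Every flagged categorical has more than 50 distinct values (Size, with 40, is not flagged). Every flagged numeric has a skewness above 20 (VendorNo, at about 7.2, is not flagged). The sketch below assumes those two thresholds; they appear to match pandas-profiling's defaults, but the values actually used are recorded in the run's config.json. The helper names are hypothetical.

```python
import statistics

# Assumed thresholds; check the run's config.json for the values actually used.
CARDINALITY_THRESHOLD = 50   # flag categoricals with more distinct values than this
SKEWNESS_THRESHOLD = 20.0    # flag numerics whose |skewness| exceeds this


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1), the estimator behind pandas' Series.skew()."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if n < 3 or m2 == 0:
        return 0.0
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def alerts_for(name, values, categorical):
    """Return alert strings worded like the Alerts list (hypothetical helper)."""
    alerts = []
    if categorical:
        distinct = len(set(values))
        if distinct > CARDINALITY_THRESHOLD:
            alerts.append(f"{name} has a high cardinality: {distinct} distinct values")
    else:
        skew = sample_skewness(values)
        if abs(skew) > SKEWNESS_THRESHOLD:
            alerts.append(f"{name} is highly skewed (γ1 = {skew:.8f})")
    return alerts


print(alerts_for("City", [f"CITY_{i}" for i in range(67)], categorical=True))
# ['City has a high cardinality: 67 distinct values']
```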

Reproduction

Analysis started: 2022-10-05 01:56:04.206800
Analysis finished: 2022-10-05 02:10:29.626483
Duration: 14 minutes and 25.42 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json
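The Duration line is the difference between the two timestamps. It can be checked quickly with Python's datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2022-10-05 01:56:04.206800")
finished = datetime.fromisoformat("2022-10-05 02:10:29.626483")

elapsed = (finished - started).total_seconds()   # about 865.42 seconds
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 14 minutes and 25.42 seconds
```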

Variables

InventoryId
Categorical

HIGH CARDINALITY

Distinct: 203508
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 913.6 MiB
68_SOLARIS_5270: 344
13_TARMSWORTH_8064: 341
35_HALIVAARA_4157: 339
35_HALIVAARA_4135: 339
59_CLAETHORPES_3606: 336
Other values (203503): 11682369

Length

Max length: 22
Median length: 20
Mean length: 16.99202743
Min length: 10
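Length statistics like these can be reproduced from the per-value string lengths. The minimal sketch below uses only the standard library. It takes a plain median, so the profiler's own "Median length" (which may be computed differently internally) is not guaranteed to match exactly on real data.

```python
import statistics


def length_stats(values):
    """Max/median/mean/min string length, labelled as in the Length table."""
    lengths = [len(v) for v in values]
    return {
        "Max length": max(lengths),
        "Median length": statistics.median(lengths),
        "Mean length": statistics.fmean(lengths),
        "Min length": min(lengths),
    }


ids = ["1_HARDERSFIELD_1004", "68_SOLARIS_5270", "13_TARMSWORTH_8064"]
print(length_stats(ids))
```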

Characters and Unicode

Total characters: 198536004
Distinct characters: 35
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
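Python's unicodedata module exposes the same general categories. This sketch counts characters per category and maps the two-letter codes to the long names used in the report. The mapping covers only the categories that appear in this report. Script and block names are not available from unicodedata, so they are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["1_HARDERSFIELD_1004"]))
```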

Unique

Unique: 5356
Unique (%): < 0.1%

Sample

1st row: 1_HARDERSFIELD_1004
2nd row: 1_HARDERSFIELD_1004
3rd row: 1_HARDERSFIELD_1004
4th row: 1_HARDERSFIELD_1004
5th row: 1_HARDERSFIELD_1004
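The sample values and the top values (for example 13_TARMSWORTH_3606, whose trailing number is also the most common Brand) suggest that InventoryId is built as Store_CITY_Brand. That reading is an inference, not something the report states. A minimal sketch that splits an id under that assumption follows. It splits on the first and last underscore so that city names containing spaces or apostrophes (the words table shows tokens such as 56_beggar's) stay intact. The function name is hypothetical.

```python
def split_inventory_id(inventory_id):
    """Split an id such as '1_HARDERSFIELD_1004' into (store, city, brand).

    Assumes the first '_' ends the store number and the last '_' starts
    the brand number; everything in between is treated as the city name.
    """
    store, rest = inventory_id.split("_", 1)
    city, brand = rest.rsplit("_", 1)
    return int(store), city, int(brand)


print(split_inventory_id("1_HARDERSFIELD_1004"))  # (1, 'HARDERSFIELD', 1004)
```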

Common Values

Value | Count | Frequency (%)
68_SOLARIS_5270 | 344 | < 0.1%
13_TARMSWORTH_8064 | 341 | < 0.1%
35_HALIVAARA_4157 | 339 | < 0.1%
35_HALIVAARA_4135 | 339 | < 0.1%
59_CLAETHORPES_3606 | 336 | < 0.1%
39_EASTHALLOW_8111 | 336 | < 0.1%
49_GARIGILL_8184 | 334 | < 0.1%
18_FURNESS_1892 | 333 | < 0.1%
13_TARMSWORTH_3837 | 332 | < 0.1%
13_TARMSWORTH_3606 | 330 | < 0.1%
Other values (203498) | 11680704 | > 99.9%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
55_dry | 229503 | 1.9%
56_beggar's | 143420 | 1.2%
42_black | 94665 | 0.8%
47_pella's | 87608 | 0.7%
26_knife's | 25453 | 0.2%
68_solaris_5270 | 344 | < 0.1%
13_tarmsworth_8064 | 341 | < 0.1%
35_halivaara_4157 | 339 | < 0.1%
35_halivaara_4135 | 339 | < 0.1%
39_easthallow_8111 | 336 | < 0.1%
Other values (203503) | 11682369 | 95.3%

Most occurring characters

Value | Count | Frequency (%)
_ | 23368136 | 11.8%
E | 12492906 | 6.3%
R | 10407731 | 5.2%
3 | 9859820 | 5.0%
N | 9303426 | 4.7%
A | 8837376 | 4.5%
1 | 8422395 | 4.2%
2 | 8217364 | 4.1%
6 | 7963798 | 4.0%
4 | 7956470 | 4.0%
Other values (25) | 91706582 | 46.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 101943978 | 51.3%
Decimal Number | 72386760 | 36.5%
Connector Punctuation | 23368136 | 11.8%
Space Separator | 580649 | 0.3%
Other Punctuation | 256481 | 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 12492906 | 12.3%
R | 10407731 | 10.2%
N | 9303426 | 9.1%
A | 8837376 | 8.7%
O | 6906316 | 6.8%
L | 6549263 | 6.4%
S | 6497537 | 6.4%
T | 5722509 | 5.6%
D | 4414554 | 4.3%
C | 4182380 | 4.1%
Other values (12) | 26629980 | 26.1%
Decimal Number
Value | Count | Frequency (%)
3 | 9859820 | 13.6%
1 | 8422395 | 11.6%
2 | 8217364 | 11.4%
6 | 7963798 | 11.0%
4 | 7956470 | 11.0%
5 | 7388284 | 10.2%
7 | 6916425 | 9.6%
8 | 6446572 | 8.9%
9 | 4888778 | 6.8%
0 | 4326854 | 6.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 23368136 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 580649 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
' | 256481 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 101943978 | 51.3%
Common | 96592026 | 48.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 12492906 | 12.3%
R | 10407731 | 10.2%
N | 9303426 | 9.1%
A | 8837376 | 8.7%
O | 6906316 | 6.8%
L | 6549263 | 6.4%
S | 6497537 | 6.4%
T | 5722509 | 5.6%
D | 4414554 | 4.3%
C | 4182380 | 4.1%
Other values (12) | 26629980 | 26.1%
Common
Value | Count | Frequency (%)
_ | 23368136 | 24.2%
3 | 9859820 | 10.2%
1 | 8422395 | 8.7%
2 | 8217364 | 8.5%
6 | 7963798 | 8.2%
4 | 7956470 | 8.2%
5 | 7388284 | 7.6%
7 | 6916425 | 7.2%
8 | 6446572 | 6.7%
9 | 4888778 | 5.1%
Other values (3) | 5163984 | 5.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 198536004 | 100.0%


Store
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 79
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.58414561
Minimum: 1
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 23
median: 46
Q3: 66
95-th percentile: 76
Maximum: 79
Range: 78
Interquartile range (IQR): 43
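The quantile block follows from the sorted values. For Store, Range = 79 - 1 = 78 and IQR = Q3 - Q1 = 66 - 23 = 43. The sketch below assumes linear interpolation between order statistics, which is what `statistics.quantiles(..., method="inclusive")` computes and what NumPy and pandas use by default. The helper name is hypothetical.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics as in the report, via linear interpolation."""
    q = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points; q[k-1] is the k-th percentile
    return {
        "Minimum": min(values),
        "5-th percentile": q[4],
        "Q1": q[24],
        "median": q[49],
        "Q3": q[74],
        "95-th percentile": q[94],
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Interquartile range (IQR)": q[74] - q[24],
    }


# Illustrative input: one row for each store number 1..79 (not the real distribution).
print(quantile_summary(list(range(1, 80))))
```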

Descriptive statistics

Standard deviation: 23.54349106
Coefficient of variation (CV): 0.540184756
Kurtosis: -1.251263548
Mean: 43.58414561
Median Absolute Deviation (MAD): 21
Skewness: -0.2055342007
Sum: 509240121
Variance: 554.2959713
Monotonicity: Not monotonic
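Each descriptive statistic has a short closed form. For example, CV is the standard deviation divided by the mean: 23.54349106 / 43.58414561 ≈ 0.5402. The sketch below assumes pandas' conventions: sample variance (ddof = 1), MAD as the median of absolute deviations from the median, and bias-adjusted skewness (G1) and excess kurtosis (G2). It is an illustration, not the profiler's own code.

```python
import math
import statistics


def descriptive_summary(values):
    """Descriptive statistics in the style of the report (needs at least 4 values)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)          # sample variance, ddof = 1
    std = math.sqrt(variance)
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)
    # Central moments for the bias-adjusted skewness (G1) and excess kurtosis (G2).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Skewness": g1 * (n * (n - 1)) ** 0.5 / (n - 2),
        "Sum": sum(values),
        "Variance": variance,
    }


print(descriptive_summary([1, 2, 3, 4, 5]))
```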
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
76 | 439940 | 3.8%
73 | 406820 | 3.5%
38 | 381558 | 3.3%
34 | 381510 | 3.3%
66 | 375164 | 3.2%
67 | 344822 | 3.0%
69 | 315666 | 2.7%
50 | 287218 | 2.5%
60 | 267212 | 2.3%
15 | 244581 | 2.1%
Other values (69) | 8239577 | 70.5%

Minimum 10 values
Value | Count | Frequency (%)
1 | 205824 | 1.8%
2 | 157945 | 1.4%
3 | 17988 | 0.2%
4 | 113137 | 1.0%
5 | 60499 | 0.5%
6 | 173943 | 1.5%
7 | 162033 | 1.4%
8 | 121033 | 1.0%
9 | 171195 | 1.5%
10 | 207668 | 1.8%

Maximum 10 values
Value | Count | Frequency (%)
79 | 166902 | 1.4%
78 | 92875 | 0.8%
77 | 117887 | 1.0%
76 | 439940 | 3.8%
75 | 127344 | 1.1%
74 | 173253 | 1.5%
73 | 406820 | 3.5%
72 | 154523 | 1.3%
71 | 157970 | 1.4%
70 | 77512 | 0.7%
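A common-values table is the top-k entries of a frequency count plus an "Other values (n)" row for the remainder. For Store, 11684068 minus the top-10 total of 3444491 gives the 8239577 shown. In the sketch below, the "< 0.1%" cut-off is inferred from the tables, not taken from the library source: Brand 77, with 6197 rows (about 0.053%), is printed as 0.1%, while shares under 0.05% are printed as "< 0.1%". The "> 99.9%" cap seen in one InventoryId row is not reproduced. Helper names are hypothetical.

```python
from collections import Counter


def format_pct(count, total):
    """Render a share like the report: one decimal place, '< 0.1%' below 0.05%."""
    pct = 100 * count / total
    return "< 0.1%" if pct < 0.05 else f"{pct:.1f}%"


def common_values(values, top=10):
    """Top-`top` values plus an 'Other values (n)' row for everything else."""
    counts = Counter(values)
    total = len(values)
    ranked = counts.most_common()
    rows = [(value, n, format_pct(n, total)) for value, n in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, format_pct(other, total)))
    return rows


for row in common_values([76] * 5 + [73] * 3 + [1, 2], top=2):
    print(" | ".join(str(cell) for cell in row))
```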

Brand
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8005
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11971.51603
Minimum: 58
Maximum: 90090
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 58
5-th percentile: 1376
Q1: 3679
median: 6216
Q3: 16656
95-th percentile: 40337
Maximum: 90090
Range: 90032
Interquartile range (IQR): 12977

Descriptive statistics

Standard deviation: 12398.43042
Coefficient of variation (CV): 1.035660846
Kurtosis: 0.6925634103
Mean: 11971.51603
Median Absolute Deviation (MAD): 2972
Skewness: 1.38949857
Sum: 1.398760073 × 10^11
Variance: 153721076.8
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
3606 | 21899 | 0.2%
8111 | 21556 | 0.2%
5111 | 20952 | 0.2%
1892 | 20745 | 0.2%
4157 | 19175 | 0.2%
4135 | 19095 | 0.2%
3896 | 18760 | 0.2%
8068 | 18755 | 0.2%
2120 | 18608 | 0.2%
3837 | 18534 | 0.2%
Other values (7995) | 11485989 | 98.3%

Minimum 10 values
Value | Count | Frequency (%)
58 | 2146 | < 0.1%
60 | 821 | < 0.1%
61 | 16 | < 0.1%
62 | 2344 | < 0.1%
63 | 2174 | < 0.1%
72 | 336 | < 0.1%
75 | 8 | < 0.1%
77 | 6197 | 0.1%
79 | 4139 | < 0.1%
82 | 15 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
90090 | 6 | < 0.1%
90089 | 42 | < 0.1%
90088 | 18 | < 0.1%
90087 | 14 | < 0.1%
90086 | 24 | < 0.1%
90085 | 6 | < 0.1%
90084 | 4 | < 0.1%
90082 | 16 | < 0.1%
90081 | 11 | < 0.1%
90080 | 6 | < 0.1%
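The minimum and maximum tables list the k smallest and k largest distinct values together with their row counts, rather than the k most frequent values. A minimal sketch, assuming k = 10 as in the tables. `heapq` avoids sorting all 8005 distinct Brand values. The helper name is hypothetical.

```python
import heapq
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    smallest = heapq.nsmallest(k, counts)   # iterating a Counter yields distinct values
    largest = heapq.nlargest(k, counts)
    return ([(v, counts[v]) for v in smallest],
            [(v, counts[v]) for v in largest])


low, high = extreme_values([58, 58, 60, 61, 90090, 90089, 90089], k=2)
print(low)   # [(58, 2), (60, 1)]
print(high)  # [(90090, 1), (90089, 2)]
```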

Description
Categorical

HIGH CARDINALITY

Distinct: 7215
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.9 MiB
Jack Daniels No 7 Black: 77657
Jagermeister Liqueur: 74087
Capt Morgan Spiced Rum: 73056
Tito's Handmade Vodka: 70683
Smirnoff 80 Proof: 70310
Other values (7210): 11318275

Length

Max length: 28
Median length: 22
Mean length: 21.05547648
Min length: 6

Characters and Unicode

Total characters: 246013619
Distinct characters: 75
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 290
Unique (%): < 0.1%

Sample

1st row: Jim Beam w/2 Rocks Glasses
2nd row: Jim Beam w/2 Rocks Glasses
3rd row: Jim Beam w/2 Rocks Glasses
4th row: Jim Beam w/2 Rocks Glasses
5th row: Jim Beam w/2 Rocks Glasses

Common Values

Value | Count | Frequency (%)
Jack Daniels No 7 Black | 77657 | 0.7%
Jagermeister Liqueur | 74087 | 0.6%
Capt Morgan Spiced Rum | 73056 | 0.6%
Tito's Handmade Vodka | 70683 | 0.6%
Smirnoff 80 Proof | 70310 | 0.6%
Bacardi Superior Rum | 70010 | 0.6%
Kahlua | 69937 | 0.6%
Jim Beam | 68003 | 0.6%
Absolut 80 Proof | 67528 | 0.6%
Jose Cuervo Especial | 61128 | 0.5%
Other values (7205) | 10981669 | 94.0%

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
vodka | 1277087 | 3.1%
svgn | 871713 | 2.1%
chard | 664559 | 1.6%
rum | 647514 | 1.6%
cab | 641638 | 1.6%
pnt | 630297 | 1.5%
cal | 399372 | 1.0%
black | 386037 | 0.9%
smirnoff | 379514 | 0.9%
red | 374024 | 0.9%
Other values (6640) | 34788489 | 84.7%
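The word tables hold lower-case tokens with punctuation kept (svgn, 56_beggar's, 1.75l). That pattern looks like lower-casing followed by splitting on whitespace. The minimal sketch below works under that assumption. The profiler may apply extra filtering, such as stop words, which this does not reproduce.

```python
from collections import Counter


def word_counts(values):
    """Lower-case each value, split on whitespace, and count the tokens."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


counts = word_counts(["Smirnoff 80 Proof", "Smirnoff Vodka"])
print(counts["smirnoff"], counts["vodka"])  # 2 1
```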

Most occurring characters

Value | Count | Frequency (%)
(space) | 29377457 | 11.9%
a | 20177049 | 8.2%
e | 18241330 | 7.4%
r | 15332560 | 6.2%
o | 15043223 | 6.1%
n | 13110019 | 5.3%
i | 12644026 | 5.1%
l | 11560673 | 4.7%
t | 8011308 | 3.3%
s | 7450879 | 3.0%
Other values (65) | 95065095 | 38.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 172494614 | 70.1%
Uppercase Letter | 41272374 | 16.8%
Space Separator | 29377457 | 11.9%
Decimal Number | 1999725 | 0.8%
Other Punctuation | 746980 | 0.3%
Dash Punctuation | 99634 | < 0.1%
Math Symbol | 21787 | < 0.1%
Open Punctuation | 524 | < 0.1%
Close Punctuation | 524 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 20177049 | 11.7%
e | 18241330 | 10.6%
r | 15332560 | 8.9%
o | 15043223 | 8.7%
n | 13110019 | 7.6%
i | 12644026 | 7.3%
l | 11560673 | 6.7%
t | 8011308 | 4.6%
s | 7450879 | 4.3%
d | 7013627 | 4.1%
Other values (16) | 43909920 | 25.5%
Uppercase Letter
Value | Count | Frequency (%)
C | 5991533 | 14.5%
S | 4780870 | 11.6%
B | 3871089 | 9.4%
R | 3226514 | 7.8%
M | 2821845 | 6.8%
V | 2535173 | 6.1%
P | 2342603 | 5.7%
G | 1849123 | 4.5%
T | 1820126 | 4.4%
D | 1427876 | 3.5%
Other values (16) | 10605622 | 25.7%
Decimal Number
Value | Count | Frequency (%)
0 | 508467 | 25.4%
1 | 463174 | 23.2%
8 | 301000 | 15.1%
7 | 174757 | 8.7%
2 | 147531 | 7.4%
5 | 133339 | 6.7%
4 | 90702 | 4.5%
9 | 84469 | 4.2%
3 | 62441 | 3.1%
6 | 33845 | 1.7%
Other Punctuation
Value | Count | Frequency (%)
' | 387986 | 51.9%
& | 195035 | 26.1%
/ | 148838 | 19.9%
# | 5508 | 0.7%
* | 4233 | 0.6%
. | 3588 | 0.5%
! | 1750 | 0.2%
% | 42 | < 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 29377457 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 99634 | 100.0%
Math Symbol
Value | Count | Frequency (%)
+ | 21787 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 524 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 524 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 213766988 | 86.9%
Common | 32246631 | 13.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 20177049 | 9.4%
e | 18241330 | 8.5%
r | 15332560 | 7.2%
o | 15043223 | 7.0%
n | 13110019 | 6.1%
i | 12644026 | 5.9%
l | 11560673 | 5.4%
t | 8011308 | 3.7%
s | 7450879 | 3.5%
d | 7013627 | 3.3%
Other values (42) | 85182294 | 39.8%
Common
Value | Count | Frequency (%)
(space) | 29377457 | 91.1%
0 | 508467 | 1.6%
1 | 463174 | 1.4%
' | 387986 | 1.2%
8 | 301000 | 0.9%
& | 195035 | 0.6%
7 | 174757 | 0.5%
/ | 148838 | 0.5%
2 | 147531 | 0.5%
5 | 133339 | 0.4%
Other values (13) | 409047 | 1.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 246013619 | 100.0%


Size
Categorical

HIGH CORRELATION

Distinct: 40
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 777.8 MiB
750mL: 6661610
1.75L: 1976333
50mL: 1031306
1.5L: 734451
375mL: 595318
Other values (35): 685050

Length

Max length: 10
Median length: 5
Mean length: 4.804751821
Min length: 2

Characters and Unicode

Total characters: 56139047
Distinct characters: 22
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 750mL
2nd row: 750mL
3rd row: 750mL
4th row: 750mL
5th row: 750mL

Common Values

Value | Count | Frequency (%)
750mL | 6661610 | 57.0%
1.75L | 1976333 | 16.9%
50mL | 1031306 | 8.8%
1.5L | 734451 | 6.3%
375mL | 595318 | 5.1%
Liter | 207512 | 1.8%
3L | 153229 | 1.3%
5L | 130108 | 1.1%
187mL 4 Pk | 37100 | 0.3%
500mL | 27898 | 0.2%
Other values (30) | 129203 | 1.1%
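Several Size labels line up exactly with Volume values in millilitres. 1.75L and Volume 1750 both have 1976333 rows, Liter and 1000 both have 207512, and 3L and 3000 both have 153229. This is consistent with the reported Volume-Size correlation. The parser below is a hypothetical sketch of that mapping. It assumes "Liter" means 1000 mL and that pack suffixes such as "4 Pk" are ignored, so the result is per-container volume. It has not been checked against all 40 Size labels.

```python
import re

# Leading amount followed by a mL or L unit, e.g. "750mL", "1.75L", "187mL 4 Pk".
_SIZE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(mL|L)\b", re.IGNORECASE)


def size_to_ml(size):
    """Hypothetical parser: map a Size label to a per-container volume in mL."""
    text = size.strip()
    if text.lower() == "liter":
        return 1000.0
    match = _SIZE_RE.match(text)
    if match is None:
        return None                      # label not understood by this sketch
    amount = float(match.group(1))
    return amount if match.group(2).lower() == "ml" else amount * 1000


for label in ["750mL", "1.75L", "Liter", "187mL 4 Pk"]:
    print(label, size_to_ml(label))
```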

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
750ml | 6663894 | 56.3%
1.75l | 1976333 | 16.7%
50ml | 1048105 | 8.8%
1.5l | 734451 | 6.2%
375ml | 595987 | 5.0%
liter | 207512 | 1.8%
3l | 153229 | 1.3%
5l | 130108 | 1.1%
pk | 78363 | 0.7%
4 | 63483 | 0.5%
Other values (21) | 193194 | 1.6%

Most occurring characters

Value | Count | Frequency (%)
L | 11681553 | 20.8%
5 | 11195882 | 19.9%
7 | 9292961 | 16.6%
m | 8463241 | 15.1%
0 | 7901892 | 14.1%
1 | 2797613 | 5.0%
. | 2713299 | 4.8%
3 | 764206 | 1.4%
e | 207512 | 0.4%
r | 207512 | 0.4%
Other values (12) | 913376 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 32125476 | 57.2%
Uppercase Letter | 11762431 | 21.0%
Lowercase Letter | 9374167 | 16.7%
Other Punctuation | 2715707 | 4.8%
Space Separator | 160591 | 0.3%
Math Symbol | 675 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 11195882 | 34.9%
7 | 9292961 | 28.9%
0 | 7901892 | 24.6%
1 | 2797613 | 8.7%
3 | 764206 | 2.4%
4 | 80062 | 0.2%
8 | 55657 | 0.2%
2 | 37203 | 0.1%
Lowercase Letter
Value | Count | Frequency (%)
m | 8463241 | 90.3%
e | 207512 | 2.2%
r | 207512 | 2.2%
t | 207512 | 2.2%
i | 207512 | 2.2%
k | 78363 | 0.8%
z | 2515 | < 0.1%
Uppercase Letter
Value | Count | Frequency (%)
L | 11681553 | 99.3%
P | 78363 | 0.7%
O | 2515 | < 0.1%
Other Punctuation
Value | Count | Frequency (%)
. | 2713299 | 99.9%
/ | 2408 | 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 160591 | 100.0%
Math Symbol
Value | Count | Frequency (%)
+ | 675 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 35002449 | 62.3%
Latin | 21136598 | 37.7%

Most frequent character per script

Common
Value | Count | Frequency (%)
5 | 11195882 | 32.0%
7 | 9292961 | 26.5%
0 | 7901892 | 22.6%
1 | 2797613 | 8.0%
. | 2713299 | 7.8%
3 | 764206 | 2.2%
(space) | 160591 | 0.5%
4 | 80062 | 0.2%
8 | 55657 | 0.2%
2 | 37203 | 0.1%
Other values (2) | 3083 | < 0.1%
Latin
Value | Count | Frequency (%)
L | 11681553 | 55.3%
m | 8463241 | 40.0%
e | 207512 | 1.0%
r | 207512 | 1.0%
t | 207512 | 1.0%
i | 207512 | 1.0%
P | 78363 | 0.4%
k | 78363 | 0.4%
O | 2515 | < 0.1%
z | 2515 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 56139047 | 100.0%


SalesQuantity
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 351
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.558967048
Minimum: 1
Maximum: 1231
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 8
Maximum: 1231
Range: 1230
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 4.494622283
Coefficient of variation (CV): 1.756420539
Kurtosis: 2770.417282
Mean: 2.558967048
Median Absolute Deviation (MAD): 0
Skewness: 26.01507302
Sum: 29899145
Variance: 20.20162946
Monotonicity: Not monotonic
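With a median of 1, a 95-th percentile of 8 and a maximum of 1231, SalesQuantity is dominated by a long right tail. That is why it carries the Skewed alert. A common remedy is a log transform before modelling. `log1p` (log of 1 + x) is used below because SalesDollars and SalesPrice each contain 32 zeros, where a plain log is undefined. The sample values are illustrative, not taken from the dataset.

```python
import math
import statistics


def skewness(values):
    """Adjusted Fisher-Pearson skewness (G1), as used by pandas' Series.skew()."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5 * (n * (n - 1)) ** 0.5 / (n - 2)


quantities = [1, 1, 1, 2, 2, 3, 10, 50]        # illustrative right-skewed sample
logged = [math.log1p(q) for q in quantities]    # log(1 + x) is also defined at 0
print(round(skewness(quantities), 2), round(skewness(logged), 2))
```

The transform compresses the tail, so the skewness of `logged` is noticeably smaller than that of the raw values.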
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
1 | 6395518 | 54.7%
2 | 2390305 | 20.5%
3 | 990205 | 8.5%
4 | 573608 | 4.9%
5 | 305510 | 2.6%
6 | 262681 | 2.2%
7 | 143383 | 1.2%
8 | 106698 | 0.9%
12 | 82691 | 0.7%
9 | 73134 | 0.6%
Other values (341) | 360335 | 3.1%

Minimum 10 values
Value | Count | Frequency (%)
1 | 6395518 | 54.7%
2 | 2390305 | 20.5%
3 | 990205 | 8.5%
4 | 573608 | 4.9%
5 | 305510 | 2.6%
6 | 262681 | 2.2%
7 | 143383 | 1.2%
8 | 106698 | 0.9%
9 | 73134 | 0.6%
10 | 60832 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
1231 | 1 | < 0.1%
1176 | 1 | < 0.1%
1016 | 1 | < 0.1%
1010 | 1 | < 0.1%
807 | 1 | < 0.1%
721 | 2 | < 0.1%
720 | 2 | < 0.1%
708 | 1 | < 0.1%
697 | 1 | < 0.1%
686 | 1 | < 0.1%

SalesDollars
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 9828
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.77872238
Minimum: 0
Maximum: 26061.14
Zeros: 32
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.95
Q1: 10.99
median: 18.99
Q3: 35.96
95-th percentile: 109.89
Maximum: 26061.14
Range: 26061.14
Interquartile range (IQR): 24.97

Descriptive statistics

Standard deviation: 89.06332056
Coefficient of variation (CV): 2.489281747
Kurtosis: 4128.714697
Mean: 35.77872238
Median Absolute Deviation (MAD): 9
Skewness: 39.84989273
Sum: 418041025.3
Variance: 7932.275069
Monotonicity: Not monotonic
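The report does not say which coefficient triggered the High correlation alerts. pandas-profiling can compute several (Pearson, Spearman, Kendall, Phik and Cramér's V). As one illustration, a rank-based (Spearman) correlation, which is robust to the heavy skew seen here, can be computed with the standard library alone. The sample rows are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


quantity = [1, 2, 2, 3, 12]                      # made-up SalesQuantity rows
dollars = [9.99, 19.98, 25.98, 29.97, 119.88]    # made-up SalesDollars rows
print(spearman(quantity, dollars))
```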
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
9.99 | 562822 | 4.8%
12.99 | 403091 | 3.4%
10.99 | 377270 | 3.2%
19.99 | 303203 | 2.6%
11.99 | 293380 | 2.5%
14.99 | 283563 | 2.4%
8.99 | 266572 | 2.3%
13.99 | 263456 | 2.3%
16.99 | 244674 | 2.1%
17.99 | 242775 | 2.1%
Other values (9818) | 8443262 | 72.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 32 | < 0.1%
0.49 | 2 | < 0.1%
0.99 | 90946 | 0.8%
1.29 | 4021 | < 0.1%
1.49 | 9795 | 0.1%
1.79 | 1761 | < 0.1%
1.98 | 80510 | 0.7%
1.99 | 60079 | 0.5%
2.29 | 7932 | 0.1%
2.49 | 3809 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
26061.14 | 1 | < 0.1%
22994.03 | 1 | < 0.1%
18805.05 | 1 | < 0.1%
18552.93 | 1 | < 0.1%
17697.05 | 1 | < 0.1%
15599.88 | 1 | < 0.1%
15276.24 | 1 | < 0.1%
14487.24 | 1 | < 0.1%
13335.53 | 1 | < 0.1%
13279.97 | 1 | < 0.1%

SalesPrice
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 392
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.83307543
Minimum: 0
Maximum: 5799.99
Zeros: 32
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1.99
Q1: 8.99
median: 12.99
Q3: 19.99
95-th percentile: 37.99
Maximum: 5799.99
Range: 5799.99
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 14.5346574
Coefficient of variation (CV): 0.917993315
Kurtosis: 10583.20284
Mean: 15.83307543
Median Absolute Deviation (MAD): 5
Skewness: 42.4717086
Sum: 184994729.9
Variance: 211.2562657
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
9.99 | 1047663 | 9.0%
12.99 | 669786 | 5.7%
10.99 | 645709 | 5.5%
8.99 | 520170 | 4.5%
11.99 | 494928 | 4.2%
0.99 | 487701 | 4.2%
19.99 | 477498 | 4.1%
14.99 | 466267 | 4.0%
7.99 | 439818 | 3.8%
13.99 | 430703 | 3.7%
Other values (382) | 6003825 | 51.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 32 | < 0.1%
0.49 | 5 | < 0.1%
0.99 | 487701 | 4.2%
1.29 | 21918 | 0.2%
1.49 | 58134 | 0.5%
1.79 | 5497 | < 0.1%
1.99 | 242915 | 2.1%
2.29 | 25946 | 0.2%
2.49 | 10033 | 0.1%
2.79 | 9171 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5799.99 | 1 | < 0.1%
4999.99 | 2 | < 0.1%
4696.99 | 3 | < 0.1%
3499.99 | 1 | < 0.1%
3399.99 | 1 | < 0.1%
3049.99 | 10 | < 0.1%
2999.99 | 3 | < 0.1%
2699.99 | 2 | < 0.1%
2199.99 | 3 | < 0.1%
1999.99 | 3 | < 0.1%

DateTime

Distinct: 364
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 178.3 MiB
Minimum: 2016-01-01 00:00:00
Maximum: 2016-12-31 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]
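The DateTime column runs from 2016-01-01 to 2016-12-31. Because 2016 is a leap year, that span covers 366 calendar days, yet only 364 distinct dates occur, so two days have no rows. The sketch below finds such gaps. It also shows how a month number like SalesDateMonth could be derived from the dates; that derivation is an assumption, since the report does not say how SalesDateMonth was built. Helper names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def missing_days(observed, start, end):
    """Calendar days between start and end (inclusive) that never occur in `observed`."""
    seen = set(observed)
    gaps, day = [], start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def month_counts(observed):
    """Rows per calendar month, i.e. what a derived month column would tabulate."""
    return Counter(d.month for d in observed)


span = (date(2016, 12, 31) - date(2016, 1, 1)).days + 1
print(span)  # 366, since 2016 is a leap year
```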

Volume
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 22
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 963.475969
Minimum: 50
Maximum: 20000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 50
5-th percentile: 50
Q1: 750
median: 750
Q3: 1500
95-th percentile: 1750
Maximum: 20000
Range: 19950
Interquartile range (IQR): 750

Descriptive statistics

Standard deviation: 707.1605581
Coefficient of variation (CV): 0.7339680292
Kurtosis: 13.68178308
Mean: 963.475969
Median Absolute Deviation (MAD): 0
Skewness: 2.622212629
Sum: 1.125731874 × 10^10
Variance: 500076.055
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=22)]

Common values
Value | Count | Frequency (%)
750 | 6663894 | 57.0%
1750 | 1976333 | 16.9%
50 | 1053037 | 9.0%
1500 | 734451 | 6.3%
375 | 595987 | 5.1%
1000 | 207512 | 1.8%
3000 | 153229 | 1.3%
5000 | 130108 | 1.1%
187 | 54959 | 0.5%
100 | 29415 | 0.3%
Other values (12) | 85143 | 0.7%

Minimum 10 values
Value | Count | Frequency (%)
50 | 1053037 | 9.0%
100 | 29415 | 0.3%
150 | 2499 | < 0.1%
180 | 599 | < 0.1%
187 | 54959 | 0.5%
200 | 23227 | 0.2%
250 | 4584 | < 0.1%
300 | 6538 | 0.1%
330 | 1315 | < 0.1%
375 | 595987 | 5.1%

Maximum 10 values
Value | Count | Frequency (%)
20000 | 1 | < 0.1%
18000 | 99 | < 0.1%
5000 | 130108 | 1.1%
4000 | 16579 | 0.1%
3000 | 153229 | 1.3%
1750 | 1976333 | 16.9%
1500 | 734451 | 6.3%
1000 | 207512 | 1.8%
750 | 6663894 | 57.0%
720 | 1788 | < 0.1%

Classification
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 735.4 MiB
1: 6939639
2: 4744429

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 11684068
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 6939639 | 59.4%
2 | 4744429 | 40.6%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
1 | 6939639 | 59.4%
2 | 4744429 | 40.6%

Most occurring characters

Value | Count | Frequency (%)
1 | 6939639 | 59.4%
2 | 4744429 | 40.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 11684068 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 6939639 | 59.4%
2 | 4744429 | 40.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 11684068 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 6939639 | 59.4%
2 | 4744429 | 40.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11684068 | 100.0%


ExciseTax
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1132
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.535389381
Minimum: 0.01
Maximum: 1260.52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.1
Q1: 0.22
median: 0.79
Q3: 1.57
95-th percentile: 5.51
Maximum: 1260.52
Range: 1260.51
Interquartile range (IQR): 1.35

Descriptive statistics

Standard deviation: 4.706324094
Coefficient of variation (CV): 3.06523163
Kurtosis: 2355.142099
Mean: 1.535389381
Median Absolute Deviation (MAD): 0.68
Skewness: 29.34954166
Sum: 17939593.93
Variance: 22.14948648
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0.79 | 2210129 | 18.9%
0.11 | 1937494 | 16.6%
0.22 | 1145104 | 9.8%
1.84 | 1020893 | 8.7%
1.57 | 623400 | 5.3%
0.45 | 440911 | 3.8%
3.67 | 393659 | 3.4%
0.39 | 352851 | 3.0%
0.34 | 311980 | 2.7%
0.05 | 263297 | 2.3%
Other values (1122) | 2984350 | 25.5%

Minimum 10 values
Value | Count | Frequency (%)
0.01 | 672 | < 0.1%
0.02 | 711 | < 0.1%
0.03 | 29328 | 0.3%
0.04 | 4163 | < 0.1%
0.05 | 263297 | 2.3%
0.06 | 41293 | 0.4%
0.07 | 36 | < 0.1%
0.08 | 19927 | 0.2%
0.09 | 896 | < 0.1%
0.1 | 229023 | 2.0%

Maximum 10 values
Value | Count | Frequency (%)
1260.52 | 1 | < 0.1%
909.56 | 1 | < 0.1%
790.12 | 1 | < 0.1%
731.85 | 1 | < 0.1%
731.32 | 1 | < 0.1%
687.22 | 1 | < 0.1%
676.2 | 1 | < 0.1%
670.69 | 1 | < 0.1%
663.34 | 1 | < 0.1%
661.5 | 1 | < 0.1%

VendorNo
Real number (ℝ≥0)

Distinct: 117
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7043.213783
Minimum: 2
Maximum: 173357
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 2
5-th percentile: 516
Q1: 3252
median: 4425
Q3: 9552
95-th percentile: 17035
Maximum: 173357
Range: 173355
Interquartile range (IQR): 6300

Descriptive statistics

Standard deviation: 8419.868134
Coefficient of variation (CV): 1.195458266
Kurtosis: 71.01064229
Mean: 7043.213783
Median Absolute Deviation (MAD): 3297
Skewness: 7.248125247
Sum: 8.229338878 × 10^10
Variance: 70894179.4
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
3960 | 1393773 | 11.9%
12546 | 1036396 | 8.9%
1392 | 826097 | 7.1%
4425 | 821478 | 7.0%
3252 | 696401 | 6.0%
17035 | 588848 | 5.0%
480 | 514597 | 4.4%
9552 | 504700 | 4.3%
8004 | 455740 | 3.9%
9165 | 411105 | 3.5%
Other values (107) | 4434933 | 38.0%

Minimum 10 values
Value | Count | Frequency (%)
2 | 6 | < 0.1%
60 | 32 | < 0.1%
105 | 255 | < 0.1%
200 | 3 | < 0.1%
287 | 90 | < 0.1%
388 | 1262 | < 0.1%
480 | 514597 | 4.4%
516 | 104065 | 0.9%
653 | 57755 | 0.5%
660 | 203919 | 1.7%

Maximum 10 values
Value | Count | Frequency (%)
173357 | 296 | < 0.1%
98450 | 6813 | 0.1%
90058 | 7094 | 0.1%
90057 | 506 | < 0.1%
90056 | 9824 | 0.1%
90053 | 1592 | < 0.1%
90052 | 395 | < 0.1%
90051 | 568 | < 0.1%
90047 | 9175 | 0.1%
90046 | 1838 | < 0.1%

VendorName
Categorical

HIGH CARDINALITY

Distinct: 118
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1020.1 MiB
DIAGEO NORTH AMERICA INC: 1393773
JIM BEAM BRANDS COMPANY: 1036396
CONSTELLATION BRANDS INC: 826097
MARTIGNETTI COMPANIES: 819715
E & J GALLO WINERY: 696401
Other values (113): 6911686

Length

Max length: 39
Median length: 27
Mean length: 26.55049337
Min length: 10

Characters and Unicode

Total characters: 310217770
Distinct characters: 46
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: JIM BEAM BRANDS COMPANY
2nd row: JIM BEAM BRANDS COMPANY
3rd row: JIM BEAM BRANDS COMPANY
4th row: JIM BEAM BRANDS COMPANY
5th row: JIM BEAM BRANDS COMPANY

Common Values

Value | Count | Frequency (%)
DIAGEO NORTH AMERICA INC | 1393773 | 11.9%
JIM BEAM BRANDS COMPANY | 1036396 | 8.9%
CONSTELLATION BRANDS INC | 826097 | 7.1%
MARTIGNETTI COMPANIES | 819715 | 7.0%
E & J GALLO WINERY | 696401 | 6.0%
PERNOD RICARD USA | 588848 | 5.0%
BACARDI USA INC | 514597 | 4.4%
M S WALKER INC | 504700 | 4.3%
SAZERAC CO INC | 455740 | 3.9%
ULTRA BEVERAGE COMPANY LLP | 411105 | 3.5%
Other values (108) | 4436696 | 38.0%
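VendorNo has 117 distinct values while VendorName has 118. If every vendor number carried exactly one name, there could be at most 117 names, so at least one VendorNo must appear with two or more names. A minimal sketch for finding such keys follows. The pairing of the two columns is assumed, and the sample rows are invented.

```python
from collections import defaultdict


def keys_with_several_names(pairs):
    """Map each key that occurs with more than one distinct name to its sorted names.

    Names are compared verbatim, so values that differ only by trailing spaces
    (several VendorName entries appear to carry one) count as different names.
    """
    names = defaultdict(set)
    for key, name in pairs:
        names[key].add(name)
    return {key: sorted(found) for key, found in names.items() if len(found) > 1}


rows = [(101, "VENDOR A"), (101, "VENDOR A (OLD NAME)"), (102, "VENDOR B")]  # hypothetical rows
print(keys_with_several_names(rows))  # {101: ['VENDOR A', 'VENDOR A (OLD NAME)']}
```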

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
inc | 5382025 | 13.5%
brands | 1983044 | 5.0%
america | 1772741 | 4.4%
north | 1597692 | 4.0%
company | 1470434 | 3.7%
diageo | 1465801 | 3.7%
usa | 1371131 | 3.4%
& | 1112919 | 2.8%
jim | 1036396 | 2.6%
beam | 1036396 | 2.6%
Other values (207) | 21636577 | 54.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 101087414 | 32.6%
A | 23051925 | 7.4%
I | 21244386 | 6.8%
N | 20353497 | 6.6%
E | 17957587 | 5.8%
R | 16670846 | 5.4%
C | 15077074 | 4.9%
O | 13207984 | 4.3%
S | 11339871 | 3.7%
T | 11124336 | 3.6%
Other values (36) | 59102850 | 19.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 206625266 | 66.6%
Space Separator | 101087414 | 32.6%
Other Punctuation | 1790317 | 0.6%
Dash Punctuation | 341366 | 0.1%
Lowercase Letter | 181029 | 0.1%
Open Punctuation | 96189 | < 0.1%
Close Punctuation | 96189 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 23051925 | 11.2%
I | 21244386 | 10.3%
N | 20353497 | 9.9%
E | 17957587 | 8.7%
R | 16670846 | 8.1%
C | 15077074 | 7.3%
O | 13207984 | 6.4%
S | 11339871 | 5.5%
T | 11124336 | 5.4%
M | 9503790 | 4.6%
Other values (16) | 47093970 | 22.8%
Lowercase Letter
Value | Count | Frequency (%)
a | 38634 | 21.3%
s | 26623 | 14.7%
r | 20811 | 11.5%
d | 18597 | 10.3%
e | 14612 | 8.1%
n | 14359 | 7.9%
l | 13626 | 7.5%
i | 7010 | 3.9%
u | 6586 | 3.6%
o | 6586 | 3.6%
Other values (3) | 13585 | 7.5%
Other Punctuation
Value | Count | Frequency (%)
& | 1112919 | 62.2%
. | 575615 | 32.2%
, | 101783 | 5.7%
Space Separator
Value | Count | Frequency (%)
(space) | 101087414 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 341366 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 96189 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 96189 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 206806295 | 66.7%
Common | 103411475 | 33.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 23051925 | 11.1%
I | 21244386 | 10.3%
N | 20353497 | 9.8%
E | 17957587 | 8.7%
R | 16670846 | 8.1%
C | 15077074 | 7.3%
O | 13207984 | 6.4%
S | 11339871 | 5.5%
T | 11124336 | 5.4%
M | 9503790 | 4.6%
Other values (29) | 47274999 | 22.9%
Common
Value | Count | Frequency (%)
(space) | 101087414 | 97.8%
& | 1112919 | 1.1%
. | 575615 | 0.6%
- | 341366 | 0.3%
, | 101783 | 0.1%
( | 96189 | 0.1%
) | 96189 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 310217770 | 100.0%


City
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct67
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size822.3 MiB
MOUNTMEND
884318 
DONCASTER
846760 
EANVERNESS
833123 
GOULCREST
 
555501
HORNSEY
 
521485
Other values (62)
8042881 

Length

Max length13
Median length12
Mean length8.796688619
Min length4

Characters and Unicode

Total characters102781108
Distinct characters24
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowHARDERSFIELD
2nd rowHARDERSFIELD
3rd rowHARDERSFIELD
4th rowHARDERSFIELD
5th rowHARDERSFIELD

Common Values

ValueCountFrequency (%)
MOUNTMEND884318
 
7.6%
DONCASTER846760
 
7.2%
EANVERNESS833123
 
7.1%
GOULCREST555501
 
4.8%
HORNSEY521485
 
4.5%
PITMERDEN381510
 
3.3%
HARDERSFIELD360347
 
3.1%
IRRAGIN267212
 
2.3%
LARNWICK263285
 
2.3%
WANBORNE244581
 
2.1%
Other values (57)6525946
55.9%

Length

2022-10-05T09:10:33.628786image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
Histogram of lengths of the category
ValueCountFrequency (%)
mountmend884318
 
7.2%
doncaster846760
 
6.9%
eanverness833123
 
6.8%
goulcrest555501
 
4.5%
hornsey521485
 
4.3%
pitmerden381510
 
3.1%
hardersfield360347
 
2.9%
irragin267212
 
2.2%
larnwick263285
 
2.1%
wanborne244581
 
2.0%
Other values (62)7106595
57.9%

Most occurring characters

ValueCountFrequency (%)
E12492906
12.2%
R10407731
 
10.1%
N9303426
 
9.1%
A8837376
 
8.6%
O6906316
 
6.7%
L6549263
 
6.4%
S6497537
 
6.3%
T5722509
 
5.6%
D4414554
 
4.3%
C4182380
 
4.1%
Other values (14)27467110
26.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter101943978
99.2%
Space Separator580649
 
0.6%
Other Punctuation256481
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E12492906
12.3%
R10407731
 
10.2%
N9303426
 
9.1%
A8837376
 
8.7%
O6906316
 
6.8%
L6549263
 
6.4%
S6497537
 
6.4%
T5722509
 
5.6%
D4414554
 
4.3%
C4182380
 
4.1%
Other values (12)26629980
26.1%
Space Separator
ValueCountFrequency (%)
580649
100.0%
Other Punctuation
ValueCountFrequency (%)
'256481
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 101943978 | 99.2%
Common | 837130 | 0.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 12492906 | 12.3%
R | 10407731 | 10.2%
N | 9303426 | 9.1%
A | 8837376 | 8.7%
O | 6906316 | 6.8%
L | 6549263 | 6.4%
S | 6497537 | 6.4%
T | 5722509 | 5.6%
D | 4414554 | 4.3%
C | 4182380 | 4.1%
Other values (12) | 26629980 | 26.1%

Common
Value | Count | Frequency (%)
(space) | 580649 | 69.4%
' | 256481 | 30.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 102781108 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
E | 12492906 | 12.2%
R | 10407731 | 10.1%
N | 9303426 | 9.1%
A | 8837376 | 8.6%
O | 6906316 | 6.7%
L | 6549263 | 6.4%
S | 6497537 | 6.3%
T | 5722509 | 5.6%
D | 4414554 | 4.3%
C | 4182380 | 4.1%
Other values (14) | 27467110 | 26.7%

SalesDateMonth
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.672609232
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 178.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
Median: 7
Q3: 10
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6
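SalesDateMonth takes only 12 values, so these quantiles can be reproduced from the monthly row counts in this variable's frequency tables without expanding all 11684068 rows. The sketch below assumes pandas' default linear interpolation between order statistics, at position q·(n − 1). The helper names are hypothetical.

```python
import numpy as np

# Rows per month, copied from the SalesDateMonth frequency tables.
counts = {1: 931294, 2: 879310, 3: 890526, 4: 900484, 5: 969355, 6: 986161,
          7: 1102646, 8: 1014932, 9: 966188, 10: 941689, 11: 932305, 12: 1169178}

keys = sorted(counts)
values = np.array(keys)
cum = np.cumsum([counts[k] for k in keys])  # rows with value <= values[i]
n = int(cum[-1])

def value_at(position: int) -> int:
    """Value of the 0-based order statistic at `position` in the sorted data."""
    return int(values[np.searchsorted(cum, position, side="right")])

def quantile_from_counts(q: float) -> float:
    """Linear interpolation between neighbouring order statistics."""
    pos = q * (n - 1)
    lo, hi = int(np.floor(pos)), int(np.ceil(pos))
    return value_at(lo) + (value_at(hi) - value_at(lo)) * (pos - lo)

quantiles = {q: quantile_from_counts(q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
iqr = quantiles[0.75] - quantiles[0.25]
print(quantiles, iqr)
```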

Descriptive statistics

Standard deviation: 3.4488207
Coefficient of variation (CV): 0.5168623818
Kurtosis: -1.176147685
Mean: 6.672609232
Median Absolute Deviation (MAD): 3
Skewness: -0.05055144269
Sum: 77963220
Variance: 11.89436422
Monotonicity: Not monotonic
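The same monthly counts reproduce the moment-based statistics. The sketch below assumes the report follows pandas' conventions:

- sample variance with ddof = 1, as in `Series.var`;
- adjusted Fisher-Pearson skewness, as in `Series.skew`;
- bias-corrected excess kurtosis, as in `Series.kurt`.

MAD and monotonicity require the raw column and are left out.

```python
import numpy as np

# Rows per month, copied from the SalesDateMonth frequency tables.
counts = {1: 931294, 2: 879310, 3: 890526, 4: 900484, 5: 969355, 6: 986161,
          7: 1102646, 8: 1014932, 9: 966188, 10: 941689, 11: 932305, 12: 1169178}

x = np.array(list(counts.keys()), dtype=float)    # distinct values
w = np.array(list(counts.values()), dtype=float)  # how often each value occurs
n = w.sum()

total = (w * x).sum()
mean = total / n
d = x - mean
s2, s3, s4 = ((w * d**k).sum() for k in (2, 3, 4))  # weighted sums of powered deviations

variance = s2 / (n - 1)  # sample variance (ddof=1)
std = np.sqrt(variance)
cv = std / mean
skewness = n * np.sqrt(n - 1) / (n - 2) * s3 / s2**1.5
kurtosis = (n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2**2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))  # excess kurtosis

print(total, mean, variance, std, cv, skewness, kurtosis)
```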
[Figure: Histogram with fixed size bins (bins=12)]
Common values

Value | Count | Frequency (%)
12 | 1169178 | 10.0%
7 | 1102646 | 9.4%
8 | 1014932 | 8.7%
6 | 986161 | 8.4%
5 | 969355 | 8.3%
9 | 966188 | 8.3%
10 | 941689 | 8.1%
11 | 932305 | 8.0%
1 | 931294 | 8.0%
4 | 900484 | 7.7%
Other values (2) | 1769836 | 15.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 931294 | 8.0%
2 | 879310 | 7.5%
3 | 890526 | 7.6%
4 | 900484 | 7.7%
5 | 969355 | 8.3%
6 | 986161 | 8.4%
7 | 1102646 | 9.4%
8 | 1014932 | 8.7%
9 | 966188 | 8.3%
10 | 941689 | 8.1%

Maximum 10 values

Value | Count | Frequency (%)
12 | 1169178 | 10.0%
11 | 932305 | 8.0%
10 | 941689 | 8.1%
9 | 966188 | 8.3%
8 | 1014932 | 8.7%
7 | 1102646 | 9.4%
6 | 986161 | 8.4%
5 | 969355 | 8.3%
4 | 900484 | 7.7%
3 | 890526 | 7.6%
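The common-values table orders distinct values by count, while the minimum and maximum tables order them by value. The sketch below builds all three with `value_counts` and `sort_index`. The sample months are hypothetical.

```python
import pandas as pd

months = pd.Series([12, 7, 7, 1, 12, 3, 12])  # hypothetical SalesDateMonth sample

value_counts = months.value_counts()                            # common values: by count
minimum_10 = value_counts.sort_index().head(10)                 # smallest values first
maximum_10 = value_counts.sort_index(ascending=False).head(10)  # largest values first

print(minimum_10)
print(maximum_10)
```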

Interactions

[Figures: 81 pairwise interaction plots (9 × 9 variable pairs)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
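The definition above translates directly into code: rank each variable, then divide the rank covariance by the product of the rank standard deviations. The sketch below applies this to hypothetical data and checks the result against pandas' `DataFrame.corr(method="spearman")`.

```python
import pandas as pd

# Hypothetical data: y increases with x, but not linearly.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [1.0, 4.0, 9.0, 16.0, 100.0]})

rx, ry = df["x"].rank(), df["y"].rank()   # average ranks (ties share the mean rank)
rho = rx.cov(ry) / (rx.std() * ry.std())  # cov of ranks / product of rank std devs

rho_pandas = df.corr(method="spearman").loc["x", "y"]
r_pearson = df.corr(method="pearson").loc["x", "y"]
print(rho, rho_pandas, r_pearson)  # rho ≈ 1.0, while Pearson's r is lower
```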

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables: replacing X with a + bX and Y with c + dY, where b, d > 0, leaves r unchanged. For points lying exactly on a non-horizontal line, this means the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
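The NumPy sketch below computes that ratio on hypothetical data and checks the invariance properties:

- a positive affine change of either variable leaves r unchanged;
- a negative scale flips the sign of r;
- points on any rising line give r = 1, whatever the slope.

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance (ddof=0)
    return float(cov / (x.std() * y.std()))         # np.std also uses ddof=0

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])  # hypothetical data

r = pearson_r(x, y)
r_affine = pearson_r(10 + 3 * x, -7 + 0.5 * y)  # a + bX, c + dY with b, d > 0
r_negated = pearson_r(x, -y)                     # negative scale flips the sign
r_steep, r_flat = pearson_r(x, 5 * x), pearson_r(x, 0.1 * x)  # slope does not matter
print(r, r_affine, r_negated, r_steep, r_flat)
```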

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant pairs of observations (both variables ordered the same way) and the discordant pairs (ordered oppositely). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n − 1)/2 for n observations.
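The sketch below counts pairs directly in quadratic time, on hypothetical data. Pairs tied in X or Y count as neither concordant nor discordant. This plain formula (τ_a) is only exact without ties. When ties are present, library routines such as SciPy's `scipy.stats.kendalltau` default to the τ_b variant, which adjusts the denominator.

```python
from itertools import combinations

def kendall_tau_a(x, y) -> float:
    """(concordant - discordant) / total pairs, by checking every pair."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # they move in opposite directions
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]  # hypothetical data: 7 concordant, 3 discordant pairs
tau = kendall_tau_a(x, y)
print(tau)  # 0.4
```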

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
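The sketch below implements the correction as it is usually stated for Bergsma (2013):

- φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and floored at 0;
- the table dimensions r and k are replaced by r − (r − 1)²/(n − 1) and k − (k − 1)²/(n − 1).

The chi-squared statistic here uses no continuity correction, which may differ from the report's exact implementation. The data are hypothetical.

```python
import numpy as np
import pandas as pd

def cramers_v_corrected(a: pd.Series, b: pd.Series) -> float:
    """Bias-corrected Cramér's V for two categorical Series."""
    table = pd.crosstab(a, b).to_numpy(dtype=float)  # r x k contingency table
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # Pearson chi-squared
    r, k = table.shape
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    # Assumes both variables have at least two categories (denominator > 0).
    return float(np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)))

# Hypothetical data: b is a relabeling of a (perfect association) ...
a = pd.Series(["x"] * 50 + ["y"] * 50)
b = pd.Series(["p"] * 50 + ["q"] * 50)
# ... and c is balanced against a (independence).
c = pd.Series(["p", "q"] * 50)
print(cramers_v_corrected(a, b), cramers_v_corrected(a, c))
```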

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: missing values by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly and visually pick out patterns in data completion.
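Both displays are built from the same boolean nullity mask. The pandas sketch below uses a hypothetical frame with two missing cells; the report itself finds 0 missing cells across its 16 columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing cells.
df = pd.DataFrame({
    "Store": [1, 1, 2],
    "City": ["HARDERSFIELD", None, "DONCASTER"],
    "SalesPrice": [16.49, 14.49, np.nan],
})

nullity = df.isna()                 # True where a cell is missing (the matrix view)
missing_per_column = nullity.sum()  # per-column counts
missing_cells = int(missing_per_column.sum())
missing_pct = 100 * missing_cells / df.size
print(missing_per_column.to_dict(), missing_cells, round(missing_pct, 1))
```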

Sample

First rows

Index | InventoryId | Store | Brand | Description | Size | SalesQuantity | SalesDollars | SalesPrice | SalesDate | Volume | Classification | ExciseTax | VendorNo | VendorName | City | SalesDateMonth
0 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 16.49 | 16.49 | 2016-01-01 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1
1 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 2 | 32.98 | 16.49 | 2016-01-02 | 750.0 | 1 | 1.57 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1
2 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 16.49 | 16.49 | 2016-01-03 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1
3 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-01-08 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 1
4 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 2 | 28.98 | 14.49 | 2016-02-09 | 750.0 | 1 | 1.57 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2
5 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 3 | 43.47 | 14.49 | 2016-02-10 | 750.0 | 1 | 2.36 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2
6 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-11 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2
7 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-16 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2
8 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-17 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2
9 | 1_HARDERSFIELD_1004 | 1 | 1004 | Jim Beam w/2 Rocks Glasses | 750mL | 1 | 14.49 | 14.49 | 2016-02-20 | 750.0 | 1 | 0.79 | 12546 | JIM BEAM BRANDS COMPANY | HARDERSFIELD | 2

Last rows

Index | InventoryId | Store | Brand | Description | Size | SalesQuantity | SalesDollars | SalesPrice | SalesDate | Volume | Classification | ExciseTax | VendorNo | VendorName | City | SalesDateMonth
11684058 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-16 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684059 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 4 | 59.96 | 14.99 | 2016-12-17 | 50.0 | 1 | 0.21 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684060 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 1 | 14.99 | 14.99 | 2016-12-19 | 50.0 | 1 | 0.05 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684061 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-20 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684062 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 3 | 44.97 | 14.99 | 2016-12-21 | 50.0 | 1 | 0.16 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684063 | 7_STANMORE_2250 | 7 | 2250 | Jack Daniels Family 4 Pk/50m | 50mL 4 Pk | 2 | 29.98 | 14.99 | 2016-12-22 | 50.0 | 1 | 0.10 | 1128 | BROWN-FORMAN CORP | STANMORE | 12
11684064 | 8_ALNERWICK_36138 | 8 | 36138 | Renzo Masi Chianti Rufina | 750mL | 2 | 21.98 | 10.99 | 2016-12-27 | 750.0 | 2 | 0.22 | 10754 | PERFECTA WINES | ALNERWICK | 12
11684065 | 8_ALNERWICK_652 | 8 | 652 | Frangelico Gift Pack-Candle | 750mL | 1 | 21.99 | 21.99 | 2016-12-21 | 750.0 | 1 | 0.79 | 11567 | CAMPARI AMERICA | ALNERWICK | 12
11684066 | 9_BLACKPOOL_3204 | 9 | 3204 | Twenty 2 Vodka | 750mL | 1 | 21.99 | 21.99 | 2016-12-01 | 750.0 | 1 | 0.79 | 7153 | PINE STATE TRADING CO | BLACKPOOL | 12
11684067 | 9_BLACKPOOL_3204 | 9 | 3204 | Twenty 2 Vodka | 750mL | 1 | 21.99 | 21.99 | 2016-12-04 | 750.0 | 1 | 0.79 | 7153 | PINE STATE TRADING CO | BLACKPOOL | 12